This report explores a dataset of White Wines in depth.

Load the Packages

Univariate Plots Section

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"
##        X        fixed.acidity    volatile.acidity  citric.acid    
##  Min.   :   1   Min.   : 3.800   Min.   :0.0800   Min.   :0.0000  
##  1st Qu.:1225   1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700  
##  Median :2450   Median : 6.800   Median :0.2600   Median :0.3200  
##  Mean   :2450   Mean   : 6.855   Mean   :0.2782   Mean   :0.3342  
##  3rd Qu.:3674   3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900  
##  Max.   :4898   Max.   :14.200   Max.   :1.1000   Max.   :1.6600  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.600   Min.   :0.00900   Min.   :  2.00     
##  1st Qu.: 1.700   1st Qu.:0.03600   1st Qu.: 23.00     
##  Median : 5.200   Median :0.04300   Median : 34.00     
##  Mean   : 6.391   Mean   :0.04577   Mean   : 35.31     
##  3rd Qu.: 9.900   3rd Qu.:0.05000   3rd Qu.: 46.00     
##  Max.   :65.800   Max.   :0.34600   Max.   :289.00     
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  9.0        Min.   :0.9871   Min.   :2.720   Min.   :0.2200  
##  1st Qu.:108.0        1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100  
##  Median :134.0        Median :0.9937   Median :3.180   Median :0.4700  
##  Mean   :138.4        Mean   :0.9940   Mean   :3.188   Mean   :0.4898  
##  3rd Qu.:167.0        3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500  
##  Max.   :440.0        Max.   :1.0390   Max.   :3.820   Max.   :1.0800  
##     alcohol         quality     
##  Min.   : 8.00   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.40   Median :6.000  
##  Mean   :10.51   Mean   :5.878  
##  3rd Qu.:11.40   3rd Qu.:6.000  
##  Max.   :14.20   Max.   :9.000

This dataset includes 11 input variables and 1 output variable (quality), with 4898 observations of White Wines.

Remove the unneeded variable X.
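
A minimal sketch of dropping the index column, assuming the data sits in a data frame called wine (a hypothetical name; the tiny data frame here is a made-up stand-in for the real 4898 rows):

```r
# Hypothetical stand-in for the wine data frame (values are made up).
wine <- data.frame(
  X = 1:3,
  alcohol = c(8.8, 9.5, 10.1),
  quality = c(6L, 6L, 5L)
)

# Drop the row-index column X; every other column is kept as-is.
wine$X <- NULL

names(wine)
```

Assigning NULL removes the column in place, which avoids having to list the 12 columns that should stay.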

Two quick histograms show the distribution of alcohol and citric acid in the dataset.
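
Histograms like these could be drawn roughly as follows. This is only a sketch in base R on synthetic values (the data frame name wine, the random values, and the bin count are all assumptions), not the exact plotting code behind this report:

```r
set.seed(1)

# Synthetic values for illustration only; not the real measurements.
wine <- data.frame(
  alcohol = round(runif(200, min = 8, max = 14.2), 1),
  citric.acid = round(pmax(rnorm(200, mean = 0.33, sd = 0.12), 0), 2)
)

# Draw the two histograms side by side, then restore the plot layout.
old_par <- par(mfrow = c(1, 2))
alcohol_hist <- hist(wine$alcohol, breaks = 20,
                     main = "Alcohol", xlab = "alcohol")
citric_hist <- hist(wine$citric.acid, breaks = 20,
                    main = "Citric acid", xlab = "citric.acid")
par(old_par)
```

The breaks argument is only a suggestion to hist(), so the actual number of bins can differ slightly from 20.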

These plots show a mostly normal distribution.

These plots also show a mostly normal distribution.

The residual.sugar plot is right-skewed, with most values bunched at the low end (less sweet White Wines?), and the alcohol plot is pretty spread out.
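
The shape of residual.sugar can be sanity-checked from the summary output alone. The sketch below copies the quartiles and mean from that output and computes a quartile-based (Bowley) skewness; the variable names are my own:

```r
# Values copied from the summary() output for residual.sugar.
sugar_q1 <- 1.7
sugar_median <- 5.2
sugar_mean <- 6.391
sugar_q3 <- 9.9

# Mean above median: most wines sit at the low end, with a long tail
# toward high sugar values.
mean_above_median <- sugar_mean > sugar_median

# Bowley skewness: positive when the upper half is more spread out
# than the lower half.
bowley <- (sugar_q3 + sugar_q1 - 2 * sugar_median) / (sugar_q3 - sugar_q1)

mean_above_median
bowley
```

Both checks point the same way: the bulk of the wines have low residual sugar and a few very sweet ones stretch the upper tail.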

There is an interesting spike in citric acid around 0.5.

Univariate Analysis

What is the structure of your dataset?

The dataset is made up of 4898 observations of White Wines with 11 inputs (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol) and 1 output (quality).

What are the main features of interest in your dataset?

I think the main features of this White Wine dataset are Alcohol (%) and Residual Sugar. They are the 2 main variables that do not appear to have a normal distribution.

What other features in the dataset do you think will help support your

investigation into your feature(s) of interest?

Chlorides, Volatile Acidity, and Total Sulfur Dioxide seem to play a smaller part in the quality of the White Wine. Citric acid also has an interesting spike around 0.5.

Did you create any new variables from existing variables in the dataset?

I created a new variable called quality_fac, which stores quality as a factor, so that some of my plots can group by quality and show better visualizations of the data in the Multivariate Section below.
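
Creating quality_fac might look something like the sketch below. The data frame name wine and the choice of an ordered factor over levels 3 to 9 (the observed minimum and maximum of quality) are assumptions, and the sample values are made up:

```r
# Made-up quality scores standing in for the real column.
wine <- data.frame(quality = c(6L, 5L, 7L, 3L, 9L, 6L))

# Assumption: an ordered factor covering the observed range 3..9,
# so that levels compare as 3 < 4 < ... < 9.
wine$quality_fac <- factor(wine$quality, levels = 3:9, ordered = TRUE)

# Count of wines per quality level (unused levels show up as 0).
table(wine$quality_fac)
```

Fixing the levels up front keeps the legend and axis order stable even when a subset of the data has no wines at some quality level.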

Of the features you investigated, were there any unusual distributions?

Did you perform any operations on the data to tidy, adjust, or change the

form of the data? If so, why did you do this?

The only unusual distribution I found was sulphates, which seemed to have a minor bimodal distribution. The only change I made was the new variable mentioned above.

Bivariate Plots

I used ggpairs to look for any apparent correlations in the data. There appear to be correlations between a number of variables in the dataset worth exploring, including density/residual sugar and alcohol/quality, among others.
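
The numbers behind a pairs plot can also be pulled out as a plain correlation matrix with base R's cor(). The sketch below uses synthetic data in which density is built to rise with residual sugar and fall with alcohol; the coefficients are invented purely so the example has some structure and are not estimates from the real data:

```r
set.seed(42)
n <- 300

# Synthetic stand-in for three of the wine variables (made-up coefficients).
residual.sugar <- rexp(n, rate = 1 / 6)
alcohol <- runif(n, min = 8, max = 14)
density <- 0.994 + 0.0004 * residual.sugar - 0.0008 * alcohol +
  rnorm(n, sd = 0.0005)
wine <- data.frame(residual.sugar, alcohol, density)

# Pearson correlation for every pair of columns, rounded for reading.
cor_matrix <- round(cor(wine), 2)
cor_matrix
```

Reading across the density row gives the sign and rough strength of each relationship at a glance.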

The best quality White Wines seem to have a pH of 3.0 to 3.5, alcohol content between 10 and 13, medium to high levels of citric acid (0.25 - 0.5), and low residual sugars (0 - 18).

They also seem to have a lower density, lower chlorides (0.2 - 0.6), a somewhat lower amount of sulphates, and fixed acidity between 4 and 8.

There is definitely a linear relationship between density and total sulfur dioxide as seen in the plot above, as well as a negative linear relationship between alcohol and density. After comparing density and residual sugar, there appears to be a very strong linear relationship there also.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the

investigation. How did the feature(s) of interest vary with other features in

the dataset?

The ggpairs plot was pretty interesting because it put everything together and showed correlations between the variables. The biggest were the correlations between density and residual sugars and between density and alcohol.

All the higher quality White Wines have a medium level of pH between 3-3.5,

higher level of alcohol content, mid-high citric acid level, lower residual

sugars, lower chlorides, lower density, and somewhat lower fixed acidity.

Did you observe any interesting relationships between the other features

(not the main feature(s) of interest)?

It appears that the higher the alcohol content, the less dense the White Wine is.

What was the strongest relationship you found?

The strongest relationship appears to be between Density and Residual Sugar.
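
One way to find the strongest pair without reading it off a plot is to look for the largest absolute value in the correlation matrix. This is a sketch on synthetic data, where the coefficients are invented so that density tracks residual sugar most closely; it is not the procedure used for this report:

```r
set.seed(3)
n <- 300

# Synthetic stand-in data (made-up coefficients and distributions).
residual.sugar <- runif(n, min = 0.6, max = 20)
alcohol <- runif(n, min = 8, max = 14)
pH <- rnorm(n, mean = 3.2, sd = 0.15)
density <- 0.994 + 0.0004 * residual.sugar - 0.0004 * alcohol +
  rnorm(n, sd = 0.0005)
wine <- data.frame(residual.sugar, alcohol, pH, density)

cm <- cor(wine)
# Blank out the diagonal and upper triangle so each pair is counted once.
cm[upper.tri(cm, diag = TRUE)] <- NA

# Locate the entry with the largest absolute correlation.
idx <- which(abs(cm) == max(abs(cm), na.rm = TRUE), arr.ind = TRUE)
strongest <- c(rownames(cm)[idx[1, "row"]], colnames(cm)[idx[1, "col"]])
strongest
```

Using the absolute value matters here, since a strong negative relationship (such as alcohol and density) should rank alongside a strong positive one.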

Multivariate Plots Section

Converting quality to a factor for ranking purposes.

The higher quality White Wines seem to have more alcohol and lower density

than the lower quality White Wines.

Neither plot shows any real relationship between sulphates and total or free sulfur dioxide, or any clear effect of them on the quality of White Wine. I was curious because sulphates tend to contribute to sulfur dioxide levels according to the description of attributes.

Volatile Acidity seems to play a role in the quality of White Wine, as does Fixed Acidity to a lesser extent.

Quality White Wines have an above average level of citric acid, lower level of

chlorides, higher level of alcohol, and medium to high level of fixed acidity

compared to the lower quality White Wines.

This is once again the Quality of White Wines, but this time with Density and Alcohol switched around, showing the strong linear relationship as well as the quality factor.

This is a better look at the different quality White Wines compared to Density and Alcohol together and then separately in order to see the difference.

All three of these boxplots seem to back up my findings that higher citric acid, lower density (higher alcohol content), and lower chlorides make a better quality White Wine.
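
A boxplot by quality level, plus the group medians it is drawn from, can be sketched as below. The data is synthetic, with alcohol made to rise with quality by construction, and the names wine and quality_fac follow this report but the values do not come from the real dataset:

```r
set.seed(11)
n <- 400

# Synthetic data: alcohol increases with quality by construction.
quality <- sample(3:9, n, replace = TRUE)
alcohol <- 8 + 0.5 * quality + rnorm(n, sd = 0.8)
wine <- data.frame(
  quality_fac = factor(quality, levels = 3:9, ordered = TRUE),
  alcohol = alcohol
)

# One box per quality level.
bp <- boxplot(alcohol ~ quality_fac, data = wine,
              xlab = "quality", ylab = "alcohol")

# The same comparison as numbers: median alcohol for each quality level.
medians <- tapply(wine$alcohol, wine$quality_fac, median)
round(medians, 2)
```

Printing the medians next to the plot makes it easier to tell whether a visible trend across the boxes is large or only a small shift.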

Density seems to be closely tied to the alcohol content as well as possibly

total sulfur dioxide. The higher the alcohol content the less dense the

White Wines appear to be.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the

investigation. Were there features that strengthened each other in terms of

looking at your feature(s) of interest?

I was surprised to find that the higher quality White Wines seem to have a higher alcohol content, which in turn means a lower density. I thought that the opposite would be true due to the taste of alcohol.

I was also surprised to find that the higher quality White Wines had a medium to high level of citric acid as well as low levels of chlorides (salt).

Were there any interesting or surprising interactions between features?

I thought it was definitely interesting that as the alcohol content goes

up the density goes down.

OPTIONAL: Did you create any models with your dataset? Discuss the

strengths and limitations of your model.

I did not create a model.

Final Plots and Summary

Plot One

The Density and Residual Sugar of the White Wines have a strong linear relationship, as shown in the above plot.

Plot Two

The higher quality White Wines have a higher alcohol content and lower

density than the lower quality White Wines. It also shows a strong linear

relationship between density and alcohol.

Plot Three

All three of these boxplots seem to back up my findings that higher citric acid, lower density (higher alcohol content), and lower chlorides make a better quality White Wine.

Reflection

This dataset contained 4898 observations of White Wine Quality with 11 inputs and 1 output. After exploring the data in detail I can say for certain I know a lot more about Wine than I have ever known. At first I was concentrating strictly on what variables are needed to make a high quality wine, but I found that density is closely tied to alcohol content as well as residual sugars. The more dense the wine was, the less alcohol content it contained.

After examining the sulphates and sulfur dioxide I was very surprised to learn they are not closely correlated, as the description of attributes mentions that sulphates can contribute to sulfur dioxide gas levels. It appears that density and total sulfur dioxide also have a linear relationship that could be further examined.

I think there is opportunity for further understanding of the makeup of a good quality wine with more data on a wider number of white wines. Breaking up the data into the 7 classes of whites would also allow you to gain more understanding of how the different variables that make up the wines react and come together to form a high quality wine.